# **Exercise Session 6**

Complex Pipeline, Dynamic Branch Prediction, Scoreboard/Tomasulo

Advanced Computer Architectures

Politecnico di Milano April 23rd, 2025

Alessandro Verosimile <alessandro.verosimile@polimi.it>





## **Exe: Complex Pipeline**








In this problem we will examine the execution of a code segment on the following single-issue out-of-order processor:





#### You can assume that

- All functional units are pipelined
- ALU operations take 1 cycle
- Memory operations take 2 cycles (includes time in ALU)
- Floating-point add instructions take 2 cycles
- Floating-point multiply instructions take 3 cycles
- There is no register renaming and no forwarding
- Instructions are fetched, decoded and issued in order
- The ISSUE stage is a buffer of unlimited length that holds instructions waiting to start execution
- An instruction will only enter the issue stage if it does not cause a WAR or WAW hazard
- Only one instruction can be issued at a time; if multiple instructions are ready, the oldest one goes first
- The Program Counter for branches and jumps is computed early, in the ISSUE stage
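
As a quick sanity check on these assumptions, the stall-free timing of an instruction can be computed directly. The sketch below assumes that F, D and IS each take one cycle when nothing stalls, followed by the execute cycles and one write-back cycle. The names `LATENCY`, `FRONT_END_STAGES` and `stall_free_writeback` are illustrative and are not part of the exercise text.

```python
# Execution latencies (in cycles) taken from the assumptions above.
LATENCY = {"ALU": 1, "MEM": 2, "FP_ADD": 2, "FP_MULT": 3}

# Assumption: F, D and IS take one cycle each when nothing stalls.
FRONT_END_STAGES = 3


def stall_free_writeback(fetch_cycle, unit):
    """Return the cycle in which W happens if the instruction never stalls."""
    last_is_cycle = fetch_cycle + FRONT_END_STAGES - 1
    last_exec_cycle = last_is_cycle + LATENCY[unit]
    return last_exec_cycle + 1


if __name__ == "__main__":
    for unit in LATENCY:
        print(unit, stall_free_writeback(1, unit))
```

Under these assumptions a load fetched in C1 writes back in C6, and a floating-point multiply fetched in C2 could write back in C8 at the earliest; any later write-back is caused by hazards.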














## Exe Complex Pipeline: the Code

```
LOOP: I1: LD    F1, 0(R2)
      I2: MULTD F2, F1, F1
      I3: ADDD  F3, F1, F5
      I4: MULTD F2, F3, F1
      I5: SUBD  F5, F1, F5
      I6: SUBI  R2, R2, 4
      I7: BNEZ  R2, LOOP
```



ALU OP: 1 cycle

MEM OP: 2 cycles

FP ADD: 2 cycles

FP MULT: 3 cycles





### Exe Complex Pipeline: the Conflicts

LOOP: I1: LD F1, 0(R2)

I2: MULTD F2, F1, F1

I3: ADDD F3, F1, F5

I4: MULTD F2, F3, F1

I5: SUBD F5, F1, F5

I6: SUBI R2, R2, 4

I7: BNEZ R2, LOOP

- RAW F1 I1-I2
- RAW F1 I1-I3
- RAW F1 I1-I4
- RAW F1 I1-I5
- RAW F3 I3-I4
- RAW R2 I6-I7
- WAW F2 I2-I4
- WAR F5 I3-I5
- WAR R2 I1-I6
- CNTRL
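
The data hazards listed above can be cross-checked mechanically. The following is a minimal sketch, not part of the original exercise: it encodes each instruction of one loop iteration as a hypothetical `(name, destination, sources)` tuple and scans the code in program order, tracking the most recent writer of each register and the readers seen since that write. The control hazard (CNTRL) is not covered by it.

```python
# One loop iteration: (name, destination register or None, source registers).
PROGRAM = [
    ("I1", "F1", ["R2"]),        # LD    F1, 0(R2)
    ("I2", "F2", ["F1", "F1"]),  # MULTD F2, F1, F1
    ("I3", "F3", ["F1", "F5"]),  # ADDD  F3, F1, F5
    ("I4", "F2", ["F3", "F1"]),  # MULTD F2, F3, F1
    ("I5", "F5", ["F1", "F5"]),  # SUBD  F5, F1, F5
    ("I6", "R2", ["R2"]),        # SUBI  R2, R2, 4
    ("I7", None, ["R2"]),        # BNEZ  R2, LOOP
]


def find_hazards(program):
    """Return (kind, register, older, younger) tuples for RAW, WAW and WAR."""
    last_writer = {}  # register -> name of its most recent writer
    readers = {}      # register -> instructions that read it since that write
    hazards = []
    for name, dest, srcs in program:
        unique_srcs = list(dict.fromkeys(srcs))  # drop repeated operands
        for reg in unique_srcs:
            if reg in last_writer:
                hazards.append(("RAW", reg, last_writer[reg], name))
        if dest is not None:
            if dest in last_writer:
                hazards.append(("WAW", dest, last_writer[dest], name))
            for reader in readers.get(dest, []):
                hazards.append(("WAR", dest, reader, name))
        for reg in unique_srcs:
            readers.setdefault(reg, []).append(name)
        if dest is not None:
            last_writer[dest] = name
            readers[dest] = []  # a new value: earlier readers no longer matter
    return hazards


if __name__ == "__main__":
    for kind, reg, older, younger in find_hazards(PROGRAM):
        print(f"{kind} {reg} {older}-{younger}")
```

Running it on this code segment reports the same nine data hazards as the list, although in program order rather than grouped by type.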











CC 0

ALU OP: 1 cycle, MEM OP: 2 cycles, FP ADD: 2 cycles, FP MULT: 3 cycles

|   | Instruction          | C1 | C2 | C3 | C4 | C5 | C6 | C7        | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | Notes                                        |
|---|----------------------|----|----|----|----|----|----|-----------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     |                                              |
| 2 | MULTD F2,F1,F1       |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I2                                 |
| 3 | ADDD F3,F1,F5        |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I3                                 |
| 4 | MULTD F2,F3,F1       |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4 |
| 5 | SUBD F5,F1,F5        |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I5<br>WAR F5 I3-I5                 |
| 6 | SUBI R2,R2,4         |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | WAR R2 I1-I6                                 |
| 7 | BNEZ R2, LOOP        |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW R2 I6-I7                                 |
| 8 | (New<br>Instruction) |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | CNTRL                                        |





CC 1


|   | Instruction          | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | Notes                                        |
|---|----------------------|----|----|----|----|----|----|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  |    |    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     |                                              |
| 2 | MULTD F2,F1,F1       |    |    |    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I2                                 |
| 3 | ADDD F3,F1,F5        |    |    |    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I3                                 |
| 4 | MULTD F2,F3,F1       |    |    |    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4 |
| 5 | SUBD F5,F1,F5        |    |    |    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I5<br>WAR F5 I3-I5                 |
| 6 | SUBI R2,R2,4         |    |    |    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | WAR R2 I1-I6                                 |
| 7 | BNEZ R2, LOOP        |    |    |    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW R2 I6-I7                                 |
| 8 | (New<br>Instruction) |    |    |    |    |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | CNTRL                                        |



CC 2


|   | Instruction          | C1 | C2 | C3 | C4 | C5 | C6 | C7        | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | Notes                                        |
|---|----------------------|----|----|----|----|----|----|-----------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     |                                              |
| 2 | MULTD F2,F1,F1       |    | F  |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I2                                 |
| 3 | ADDD F3,F1,F5        |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I3                                 |
| 4 | MULTD F2,F3,F1       |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4 |
| 5 | SUBD F5,F1,F5        |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I5<br>WAR F5 I3-I5                 |
| 6 | SUBI R2,R2,4         |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | WAR R2 I1-I6                                 |
| 7 | BNEZ R2, LOOP        |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | RAW R2 I6-I7                                 |
| 8 | (New<br>Instruction) |    |    |    |    |    |    |           |    |    |     |     |     |     |     |     |     |     |     |     | CNTRL                                        |



CC 3


|   | Instruction          | C1      | C2 | C3       | C4 | C5         | C6       | C7 | C8       | C9 | C10 | C11      | C12 | C13 | C14 | C15 | C16 | C17      | C18 | C19 | Notes                                        |
|---|----------------------|---------|----|----------|----|------------|----------|----|----------|----|-----|----------|-----|-----|-----|-----|-----|----------|-----|-----|----------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F       | D  | IS       |    |            |          |    |          |    |     |          |     |     |     |     |     |          |     |     |                                              |
| 2 | MULTD F2,F1,F1       |         | F  | D        |    |            |          |    |          |    |     |          |     |     |     |     |     |          |     |     | RAW F1 I1-I2                                 |
| 3 | ADDD F3,F1,F5        |         |    | F        |    |            |          |    |          |    |     |          |     |     |     |     |     |          |     |     | RAW F1 I1-I3                                 |
| 4 | MULTD F2,F3,F1       |         |    |          |    |            |          |    |          |    |     |          |     |     |     |     |     |          |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4 |
| 5 | SUBD F5,F1,F5        |         |    |          |    |            |          |    |          |    |     |          |     |     |     |     |     |          |     |     | RAW F1 I1-I5<br>WAR F5 I3-I5                 |
| 6 | SUBI R2,R2,4         |         |    |          |    |            |          |    |          |    |     |          |     |     |     |     |     |          |     |     | WAR R2 I1-I6                                 |
| 7 | BNEZ R2, LOOP        |         |    |          |    |            |          |    |          |    |     |          |     |     |     |     |     |          |     |     | RAW R2 I6-I7                                 |
| 8 | (New<br>Instruction) |         |    |          |    |            |          |    |          |    |     |          |     |     |     |     |     |          |     |     | CNTRL                                        |



CC 4


|   | Instruction          | C1 | C2 | C3 | C4      | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | Notes                                        |
|---|----------------------|----|----|----|---------|----|----|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     |                                              |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I2                                 |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I3                                 |
| 4 | MULTD F2,F3,F1       |    |    |    | F       |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4 |
| 5 | SUBD F5,F1,F5        |    |    |    |         |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I5<br>WAR F5 I3-I5                 |
| 6 | SUBI R2,R2,4         |    |    |    |         |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | WAR R2 I1-I6                                 |
| 7 | BNEZ R2, LOOP        |    |    |    |         |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW R2 I6-I7                                 |
| 8 | (New<br>Instruction) |    |    |    |         |    |    |    |    |    |     |     |     |     |     |     |     |     |     |     | CNTRL                                        |



CC 5


|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | Notes                                        |
|---|----------------------|----|----|----|---------|---------|----|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      |    |    |    |    |     |     |     |     |     |     |     |     |     |     |                                              |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I2                                 |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I3                                 |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4 |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I5<br>WAR F5 I3-I5                 |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |    |    |    |    |     |     |     |     |     |     |     |     |     |     | WAR R2 I1-I6                                 |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |    |    |    |    |     |     |     |     |     |     |     |     |     |     | RAW R2 I6-I7                                 |
| 8 | (New<br>Instruction) |    |    |    |         |         |    |    |    |    |     |     |     |     |     |     |     |     |     |     | CNTRL                                        |



CC 6


|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17   | C18 | C19 | Notes                                              |
|---|----------------------|----|----|----|---------|---------|---------|----|----|----|-----|-----|-----|-----|-----|-----|-----|-------|-----|-----|----------------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |    |    |    |     |     |     |     |     |     |     |       |     |     |                                                    |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      |    |    |    |     |     |     |     |     |     |     |       |     |     | <del>RAW F1 I1-I2</del>                            |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s |    |    |    |     |     |     |     |     |     |     |       |     |     | <del>RAW F1 I1-I3</del>                            |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  |    |    |    |     |     |     |     |     |     |     |       |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4       |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  |    |    |    |     |     |     |     |     |     |     |       |     |     | <del>RAW F1 I1-I5</del><br>WAR F5 I3-I5            |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |    |    |    |     |     |     |     |     |     |     |       |     |     | <del>WAR R2 I1-I6</del>                            |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |    |    |    |     |     |     |     |     |     |     |       |     |     | RAW R2 I6-I7                                       |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |    |    |    |     |     |     |     |     |     |     |       |     |     | CNTRL                                              |



CC 7


|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7        | C8 | C9   | C10 | C11 | C12      | C13 | C14 | C15 | C16 | C17 | C18  | C19 | Notes                                        |
|---|----------------------|----|----|----|---------|---------|---------|-----------|----|------|-----|-----|----------|-----|-----|-----|-----|-----|------|-----|----------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |           |    |      |     |     |          |     |     |     |     |     |      |     |                                              |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1        |    |      |     |     |          |     |     |     |     |     |      |     | <del>RAW F1 I1-I2</del>                      |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS        |    |      |     |     |          |     |     |     |     |     |      |     | <del>RAW F1 I1-I3</del>                      |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s    |    |      |     |     |          |     |     |     |     |     |      |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4 |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s    |    |      |     |     |          |     |     |     |     |     |      |     | <del>RAW F1 I1-I5</del><br>WAR F5 I3-I5      |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |           |    |      |     |     |          |     |     |     |     |     |      |     | <del>WAR R2 I1-I6</del>                      |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |           |    |      |     |     |          |     |     |     |     |     |      |     | RAW R2 I6-I7                                 |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |           |    |      |     |     |          |     |     |     |     |     |      |     | CNTRL                                        |



CC 9


|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7        | C8     | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | Notes                                        |
|---|----------------------|----|----|----|---------|---------|---------|-----------|--------|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |           |        |    |     |     |     |     |     |     |     |     |     |     |                                              |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1        | E2     | E3 |     |     |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I2</del>                      |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS        | E1     | E2 |     |     |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I3</del><br>Structural on WB  |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s    | D<br>s | D  |     |     |     |     |     |     |     |     |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4 |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s    | F<br>s | F  |     |     |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I5</del><br>WAR F5 I3-I5      |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |           |        |    |     |     |     |     |     |     |     |     |     |     | <del>WAR R2 I1-I6</del>                      |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |           |        |    |     |     |     |     |     |     |     |     |     |     | RAW R2 I6-I7                         |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |           |        |    |     |     |     |     |     |     |     |     |     |     | CNTRL                                |



CC 10


|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7     | C8     | C9 | C10     | C11 | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | Notes                              |
|---|----------------------|----|----|----|---------|---------|---------|--------|--------|----|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |        |        |    |         |     |     |     |     |     |     |     |     |     |                                    |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1     | E2     | E3 | W       |     |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I2</del>            |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS     | E1     | E2 | E2<br>s |     |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I3</del><br>Structural on WB |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s | D<br>s | D  | IS<br>s |     |     |     |     |     |     |     |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4 |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s | F<br>s | F  | D       |     |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I5</del><br>WAR F5 I3-I5 |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |        |        |    | F       |     |     |     |     |     |     |     |     |     | <del>WAR R2 I1-I6</del>            |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |        |        |    |         |     |     |     |     |     |     |     |     |     | RAW R2 I6-I7                       |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |        |        |    |         |     |     |     |     |     |     |     |     |     | CNTRL                              |
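
In this snapshot I4 leaves D and enters IS in C10, the same cycle in which I2 writes F2, which is how the WAW rule on entering the issue stage shows up in the diagram. The sketch below captures only that WAW part of the rule, under the assumption (read off the table, not stated in the assumptions list) that the younger instruction may enter IS in the very cycle the older writer is in W. The function name and its parameters are hypothetical.

```python
def earliest_is_entry(first_decode_cycle, older_writer_wb_cycles):
    """Earliest cycle in which an instruction may enter the IS stage.

    first_decode_cycle: first cycle the instruction spends in D.
    older_writer_wb_cycles: W cycles of older in-flight instructions that
        write the same destination register (the WAW hazards).
    """
    # Without hazards the instruction spends exactly one cycle in D.
    candidates = [first_decode_cycle + 1]
    # Assumption: IS may be entered in the cycle the older writer is in W.
    candidates.extend(older_writer_wb_cycles)
    return max(candidates)


if __name__ == "__main__":
    # I4 is first in D in C5 and has a WAW on F2 with I2, which writes in C10.
    print("I4 enters IS in cycle", earliest_is_entry(5, [10]))
    # I5 is in D in C10 and has no WAW hazard.
    print("I5 enters IS in cycle", earliest_is_entry(10, []))
```

With the values from the table this gives C10 for I4 and C11 for I5, matching the diagram. WAR hazards are left out of the sketch because in this code segment the older readers have already issued by the time the younger writers reach D.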



CC 11


|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7     | C8     | C9 | C10     | C11     | C12 | C13 | C14 | C15 | C16 | C17 | C18 | C19 | Notes                                                     |
|---|----------------------|----|----|----|---------|---------|---------|--------|--------|----|---------|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----------------------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |        |        |    |         |         |     |     |     |     |     |     |     |     |                                                           |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1     | E2     | E3 | W       |         |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I2</del>                                   |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS     | E1     | E2 | E2<br>s | W       |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I3</del><br>Structural on WB               |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s | D<br>s | D  | IS<br>s | IS      |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I4</del><br><del>RAW F3 I3-I4</del><br><del>WAW F2 I2-I4</del> |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s | F<br>s | F  | D       | IS<br>s |     |     |     |     |     |     |     |     | <del>RAW F1 I1-I5</del><br><del>WAR F5 I3-I5</del>        |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |        |        |    | F       | D       |     |     |     |     |     |     |     |     | <del>WAR R2 I1-I6</del>                                   |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |        |        |    |         | F       |     |     |     |     |     |     |     |     | RAW R2 I6-I7                                              |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |        |        |    |         |         |     |     |     |     |     |     |     |     | CNTRL                                                     |



CC 12


|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7     | C8     | C9 | C10     | C11     | C12     | C13 | C14 | C15 | C16 | C17 | C18 | C19 | Notes                                             |
|---|----------------------|----|----|----|---------|---------|---------|--------|--------|----|---------|---------|---------|-----|-----|-----|-----|-----|-----|-----|---------------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |        |        |    |         |         |         |     |     |     |     |     |     |     |                                                   |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1     | E2     | E3 | W       |         |         |     |     |     |     |     |     |     | <del>RAW F1 I1-I2</del>                           |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS     | E1     | E2 | E2<br>s | W       |         |     |     |     |     |     |     |     | <del>RAW F1 I1-I3</del><br>Structural on WB       |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s | D<br>s | D  | IS<br>s | IS      | E1      |     |     |     |     |     |     |     | <del>RAW F1 I1-I4</del><br><del>RAW F3 I3-I4</del><br><del>WAW F2 I2-I4</del> |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s | F<br>s | F  | D       | IS<br>s | IS      |     |     |     |     |     |     |     | <del>RAW F1 I1-I5</del><br><del>WAR F5 I3-I5</del> |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |        |        |    | F       | D       | IS<br>s |     |     |     |     |     |     |     | <del>WAR R2 I1-I6</del>                           |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |        |        |    |         | F       | D       |     |     |     |     |     |     |     | RAW R2 I6-I7                                      |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |        |        |    |         |         | F<br>s  |     |     |     |     |     |     |     | CNTRL                                             |



CC 14


|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7     | C8     | C9 | C10     | C11     | C12     | C13     | C14     | C15 | C16 | C17 | C18 | C19 | Notes                                             |
|---|----------------------|----|----|----|---------|---------|---------|--------|--------|----|---------|---------|---------|---------|---------|-----|-----|-----|-----|-----|---------------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |        |        |    |         |         |         |         |         |     |     |     |     |     |                                                   |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1     | E2     | E3 | W       |         |         |         |         |     |     |     |     |     | <del>RAW F1 I1-I2</del>                           |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS     | E1     | E2 | E2<br>s | W       |         |         |         |     |     |     |     |     | <del>RAW F1 I1-I3</del><br>Structural on WB       |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s | D<br>s | D  | IS<br>s | IS      | E1      | E2      | E3      |     |     |     |     |     | <del>RAW F1 I1-I4</del><br><del>RAW F3 I3-I4</del><br><del>WAW F2 I2-I4</del> |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s | F<br>s | F  | D       | IS<br>s | IS      | E1      | E2      |     |     |     |     |     | <del>RAW F1 I1-I5</del><br><del>WAR F5 I3-I5</del><br>Structural on WB |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |        |        |    | F       | D       | IS<br>s | IS      | E1      |     |     |     |     |     | <del>WAR R2 I1-I6</del>                           |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |        |        |    |         | F       | D       | IS<br>s | IS<br>s |     |     |     |     |     | RAW R2 I6-I7                                      |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |        |        |    |         |         | F<br>s  | F<br>s  | F<br>s  |     |     |     |     |     | CNTRL                                             |



CC 15


|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7        | C8     | C9 | C10     | C11     | C12     | C13     | C14     | C15     | C16 | C17 | C18 | C19 | Notes                                            |
|---|----------------------|----|----|----|---------|---------|---------|-----------|--------|----|---------|---------|---------|---------|---------|---------|-----|-----|-----|-----|--------------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |           |        |    |         |         |         |         |         |         |     |     |     |     |                                                  |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1        | E2     | E3 | W       |         |         |         |         |         |     |     |     |     | <del>RAW F1 I1-I2</del>                          |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS        | E1     | E2 | E2<br>s | W       |         |         |         |         |     |     |     |     | <del>RAW F1 I1-I3</del><br>Structural on WB      |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s    | D<br>s | D  | IS<br>s | IS      | E1      | E2      | E3      | W       |     |     |     |     | <del>RAW F1 I1-I4</del><br><del>RAW F3 I3-I4</del><br><del>WAW F2 I2-I4</del> |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s    | F<br>s | F  | D       | IS<br>s | IS      | E1      | E2      | E2<br>s |     |     |     |     | <del>RAW F1 I1-I5</del><br><del>WAR F5 I3-I5</del><br>Structural on WB |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |           |        |    | F       | D       | IS<br>s | IS      | E1      | E1<br>s |     |     |     |     | <del>WAR R2 I1-I6</del><br>Structural on WB      |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |           |        |    |         | F       | D       | IS<br>s | IS<br>s | IS<br>s |     |     |     |     | RAW R2 I6-I7                                     |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |           |        |    |         |         | F<br>s  | F<br>s  | F<br>s  | F<br>s  |     |     |     |     | CNTRL                                            |



CC 16

ALU OP: 1 cycle, MEM OP: 2 cycles, FP ADD: 2 cycles, FP MULT: 3 cycles

|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7     | C8     | C9 | C10     | C11     | C12    | C13     | C14     | C15     | C16     | C17 | C18 | C19 | Notes                                                |
|---|----------------------|----|----|----|---------|---------|---------|--------|--------|----|---------|---------|--------|---------|---------|---------|---------|-----|-----|-----|------------------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |        |        |    |         |         |        |         |         |         |         |     |     |     |                                                      |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1     | E2     | E3 | W       |         |        |         |         |         |         |     |     |     | RAW F1 I1-I2                                         |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS     | E1     | E2 | E2<br>s | W       |        |         |         |         |         |     |     |     | RAW F1 I1-I3<br>Structural on WB                     |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s | D<br>s | D  | IS<br>s | IS      | E1     | E2      | E3      | W       |         |     |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4         |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s | F<br>s | F  | D       | IS<br>s | IS     | E1      | E2      | E2<br>s | W       |     |     |     | RAW F1 I1-I5<br>WAR F5 I3-I5<br>Structural on WB     |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |        |        |    | F       | D       | IS<br>s | IS     | E1      | E1<br>s | E1<br>s |     |     |     | WAR R2 I1-I6<br>Structural on WB                     |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |        |        |    |         | F       | D      | IS<br>s | IS<br>s | IS<br>s | IS<br>s |     |     |     | RAW R2 I6-I7                                         |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |        |        |    |         |         | F<br>s | F<br>s  | F<br>s  | F<br>s  | F<br>s  |     |     |     | CNTRL                                                |



CC 17

ALU OP: 1 cycle, MEM OP: 2 cycles, FP ADD: 2 cycles, FP MULT: 3 cycles

|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7     | C8     | C9 | C10     | C11     | C12     | C13     | C14     | C15     | C16     | C17    | C18 | C19 | Notes                                                  |
|---|----------------------|----|----|----|---------|---------|---------|--------|--------|----|---------|---------|---------|---------|---------|---------|---------|--------|-----|-----|--------------------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |        |        |    |         |         |         |         |         |         |         |        |     |     |                                                        |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1     | E2     | E3 | W       |         |         |         |         |         |         |        |     |     | RAW F1 I1-I2                                           |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS     | E1     | E2 | E2<br>s | W       |         |         |         |         |         |        |     |     | RAW F1 I1-I3<br>Structural on WB                       |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s | D<br>s | D  | IS<br>s | IS      | E1      | E2      | E3      | W       |         |        |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4           |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s | F<br>s | F  | D       | IS<br>s | IS      | E1      | E2      | E2<br>s | W       |        |     |     | RAW F1 I1-I5<br>WAR F5 I3-I5<br>Structural on WB       |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |        |        |    | F       | D       | IS<br>s | IS      | E1      | E1<br>s | E1<br>s | W      |     |     | WAR R2 I1-I6<br>Structural on WB                       |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |        |        |    |         | F       | D       | IS<br>s | IS<br>s | IS<br>s | IS<br>s | IS     |     |     | RAW R2 I6-I7                                           |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |        |        |    |         |         | F<br>s  | F<br>s  | F<br>s  | F<br>s  | F<br>s  | F<br>s |     |     | CNTRL                                                  |





CC 19

ALU OP: 1 cycle, MEM OP: 2 cycles, FP ADD: 2 cycles, FP MULT: 3 cycles

|   | Instruction          | C1 | C2 | C3 | C4      | C5      | C6      | C7     | C8     | C9 | C10     | C11     | C12     | C13     | C14     | C15     | C16     | C17    | C18 | C19 | Notes                                                |
|---|----------------------|----|----|----|---------|---------|---------|--------|--------|----|---------|---------|---------|---------|---------|---------|---------|--------|-----|-----|------------------------------------------------------|
| 1 | LOOP:<br>LD F1,0(R2) | F  | D  | IS | E1      | E2      | W       |        |        |    |         |         |         |         |         |         |         |        |     |     |                                                      |
| 2 | MULTD F2,F1,F1       |    | F  | D  | IS<br>s | IS<br>s | IS      | E1     | E2     | E3 | W       |         |         |         |         |         |         |        |     |     | RAW F1 I1-I2                                         |
| 3 | ADDD F3,F1,F5        |    |    | F  | D       | IS<br>s | IS<br>s | IS     | E1     | E2 | E2<br>s | W       |         |         |         |         |         |        |     |     | RAW F1 I1-I3<br>Structural on WB                     |
| 4 | MULTD F2,F3,F1       |    |    |    | F       | D<br>s  | D<br>s  | D<br>s | D<br>s | D  | IS<br>s | IS      | E1      | E2      | E3      | W       |         |        |     |     | RAW F1 I1-I4<br>RAW F3 I3-I4<br>WAW F2 I2-I4         |
| 5 | SUBD F5,F1,F5        |    |    |    |         | F<br>s  | F<br>s  | F<br>s | F<br>s | F  | D       | IS<br>s | IS      | E1      | E2      | E2<br>s | W       |        |     |     | RAW F1 I1-I5<br>WAR F5 I3-I5<br>Structural on WB     |
| 6 | SUBI R2,R2,4         |    |    |    |         |         |         |        |        |    | F       | D       | IS<br>s | IS      | E1      | E1<br>s | E1<br>s | W      |     |     | WAR R2 I1-I6<br>Structural on WB                     |
| 7 | BNEZ R2, LOOP        |    |    |    |         |         |         |        |        |    |         | F       | D       | IS<br>s | IS<br>s | IS<br>s | IS<br>s | IS     | E1  | W   | RAW R2 I6-I7                                         |
| 8 | (New<br>Instruction) |    |    |    |         |         |         |        |        |    |         |         | F<br>s  | F<br>s  | F<br>s  | F<br>s  | F<br>s  | F<br>s | F   | D   | CNTRL                                                |





## Recall: Pipeline performance

Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls

Ideal pipeline CPI: measure of the maximum performance attainable by the implementation

**Structural hazards**: HW cannot support this combination of instructions

Data hazards: Instruction depends on result of prior instruction still in the pipeline

Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches, jumps, exceptions)
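The CPI decomposition above can be sketched as a one-line computation. This is only an illustration: the function name `pipeline_cpi` and the stall values are hypothetical, and each stall term is assumed to be an average number of stall cycles per instruction.

```python
def pipeline_cpi(ideal_cpi, structural, data, control):
    """Pipeline CPI = ideal CPI + average stall cycles per instruction
    contributed by structural, data and control hazards."""
    return ideal_cpi + structural + data + control


# Hypothetical per-instruction stall averages, chosen only for illustration.
cpi = pipeline_cpi(ideal_cpi=1.0, structural=0.25, data=0.5, control=0.25)
print(cpi)  # 2.0
```

With these made-up numbers the hazards double the CPI with respect to the ideal pipeline.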





#### Prediction: a.k.a. deal with Control Hazard

Branch vanguard: decomposing branch functionality into prediction and resolution instructions

### The IBM z15 High Frequency Mainframe





## Dynamic Branch Predictor

Describe the behaviour of a 1-bit BHT (1-BHT) and a 2-bit BHT (2-BHT) when executing the following assembly code; the answer has to be effectively supported (R0 is set to 1, R1 is set to 300).

LOOP: LD F3 0 (R0)

ADDD F1 F3 F3

ADDI R1 R1 3000

LOOP2: MULTD F2 F2 F3

SUBI R1 R1 3

BNEZ R1 LOOP2

SUBI R0 R0 2

BNEZ R0 LOOP

Is the obtained result, in terms of mispredictions, in line with the theoretical characteristics of the two predictors? Please effectively support your answer.





#### A First Consideration


LOOP: LD F3 0 (R0)

ADDD F1 F3 F3

ADDI R1 R1 3000

LOOP2: MULTD F2 F2 F3

SUBI R1 R1 3

BNEZ R1 LOOP2

SUBI R0 R0 2

BNEZ R0 LOOP

R0 is set to 1, R1 is set to 300

LOOP2: @T0 (first outer iteration) R1 = 300 + 3000 = 3300 → 3300 / 3 = 1100 iterations; @Ti (every later outer iteration) R1 restarts from 0 → 3000 / 3 = 1000 iterations

LOOP: R0 = 1 - 2 = -1 ≠ 0, and R0 stays odd from then on, so it never reaches 0 → ∞ loop
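The trip counts above can be checked with a small register-level simulation. This is a sketch under stated assumptions: the helper name `inner_trip_counts` is hypothetical, only R0 and R1 are modelled, and the outer loop is cut after a few iterations because it never terminates on its own.

```python
def inner_trip_counts(r0=1, r1=300, outer_iterations=3):
    """Run the nested loops for a few outer iterations.

    Returns the LOOP2 trip count of each outer iteration and the value
    of R0 tested by the outer BNEZ each time.
    """
    counts, r0_values = [], []
    for _ in range(outer_iterations):
        r1 += 3000                 # ADDI R1 R1 3000
        trips = 0
        while True:
            r1 -= 3                # SUBI R1 R1 3
            trips += 1
            if r1 == 0:            # BNEZ R1 LOOP2 falls through
                break
        counts.append(trips)
        r0 -= 2                    # SUBI R0 R0 2
        r0_values.append(r0)
        if r0 == 0:                # BNEZ R0 LOOP would fall through
            break
    return counts, r0_values


print(inner_trip_counts())  # ([1100, 1000, 1000], [-1, -3, -5])
```

R0 takes only odd negative values, which is why the outer branch is always taken.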





#### 1bit - BHT

LOOP: LD F3 0 (R0)

ADDD F1 F3 F3

ADDI R1 R1 3000

LOOP2: MULTD F2 F2 F3

SUBI R1 R1 3

BNEZ R1 LOOP2

SUBI R0 R0 2

BNEZ R0 LOOP

R0 is set to 1 R1 is set to 300











The BHT is indexed with k bits of the branch address, so the two branches may either collide (same entry) or not collide (distinct entries).





#### 1bit - BHT - Not Collide

LOOP: LD F3 0 (R0)

ADDD F1 F3 F3

ADDI R1 R1 3000

LOOP2: MULTD F2 F2 F3

SUBI R1 R1 3

BNEZ R1 LOOP2

SUBI R0 R0 2

BNEZ R0 LOOP

R0 is set to 1 R1 is set to 300



Let us consider that the branch addresses do not collide

Possible initial 1-BHT states (LOOP, LOOP2): (T, T), (T, NT), (NT, T), (NT, NT)





Initial 1-BHT state: LOOP: T, LOOP2: T

LOOP → no misprediction

LOOP2 → single misprediction at the end of the loop @T0, then 2 mispredictions per outer iteration (beginning and end of the loop)





Initial 1-BHT state: LOOP: T, LOOP2: NT

LOOP → no misprediction

LOOP2 → two mispredictions per outer iteration: beginning and end of the loop





Initial 1-BHT state: LOOP: NT, LOOP2: T

LOOP → only the initial misprediction

LOOP2 → single misprediction at the end of the loop @T0, then 2 mispredictions per outer iteration (beginning and end of the loop)





Initial 1-BHT state: LOOP: NT, LOOP2: NT

LOOP → only the initial misprediction

LOOP2 → two mispredictions per outer iteration: beginning and end of the loop
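The four non-colliding 1-BHT cases can be reproduced with a short simulation. This is a minimal sketch, not the course's reference solution: `simulate_1bit` is a hypothetical helper, `True` stands for "predict taken", each branch owns its entry, and only the first three outer iterations (1100, 1000, 1000 inner trips) are run.

```python
def simulate_1bit(init_loop, init_loop2, trips=(1100, 1000, 1000)):
    """1-bit BHT with one entry per branch (no collision).

    Returns, for each outer iteration, the pair
    (LOOP2 mispredictions, LOOP mispredictions).
    """
    state = {"LOOP": init_loop, "LOOP2": init_loop2}  # True = predict taken
    result = []
    for n in trips:
        miss2 = 0
        for i in range(n):
            taken = i < n - 1           # inner branch: taken except on exit
            if state["LOOP2"] != taken:
                miss2 += 1
            state["LOOP2"] = taken      # 1-bit: remember the last outcome
        miss1 = int(state["LOOP"] is not True)  # outer branch: always taken
        state["LOOP"] = True
        result.append((miss2, miss1))
    return result


print(simulate_1bit(True, True))    # [(1, 0), (2, 0), (2, 0)]
print(simulate_1bit(False, False))  # [(2, 1), (2, 0), (2, 0)]
```

After the first exit from LOOP2 the inner entry is always NT, so every later outer iteration pays two mispredictions regardless of the initial state.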





#### 1bit - BHT - Collision

LOOP: LD F3 0 (R0)

ADDD F1 F3 F3

ADDI R1 R1 3000

LOOP2: MULTD F2 F2 F3

SUBI R1 R1 3

BNEZ R1 LOOP2

SUBI R0 R0 2

BNEZ R0 LOOP

LOOP2: @T0 → 3300 / 3 = 1100 iterations, @Ti → 3000 / 3 = 1000 iterations

LOOP: 1 - 2 = -1 ≠ 0 → ∞ loop

Let us consider that the branch addresses do collide

R0 is set to 1, R1 is set to 300









Shared 1-BHT entry initially T:

LOOP2 → single misprediction at the end of the loop

LOOP → 100% failure rate

Shared 1-BHT entry initially NT:

LOOP2 → two mispredictions @T0 (beginning and end of the loop), then a single misprediction at the end of the loop

LOOP → 100% failure rate
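The collision case can be sketched the same way, with both branches reading and updating a single 1-bit entry. The helper name `simulate_1bit_shared` is hypothetical, and only the first three outer iterations are simulated.

```python
def simulate_1bit_shared(init, trips=(1100, 1000, 1000)):
    """1-bit BHT where LOOP and LOOP2 collide on the same entry.

    Returns, for each outer iteration, the pair
    (LOOP2 mispredictions, LOOP mispredictions).
    """
    state = init                        # True = predict taken
    result = []
    for n in trips:
        miss2 = 0
        for i in range(n):
            taken = i < n - 1           # inner branch: taken except on exit
            if state != taken:
                miss2 += 1
            state = taken
        # The inner exit always leaves NT in the entry, so the (always taken)
        # outer branch is mispredicted every time.
        miss1 = int(state is not True)
        state = True
        result.append((miss2, miss1))
    return result


print(simulate_1bit_shared(True))   # [(1, 1), (1, 1), (1, 1)]
print(simulate_1bit_shared(False))  # [(2, 1), (1, 1), (1, 1)]
```

The outer branch leaves T in the shared entry, which is what saves LOOP2 its misprediction at the beginning of each loop.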





#### 2bit - BHT

LOOP: LD F3 0 (R0)

ADDD F1 F3 F3

ADDI R1 R1 3000

LOOP2: MULTD F2 F2 F3

SUBI R1 R1 3

BNEZ R1 LOOP2

SUBI R0 R0 2

BNEZ R0 LOOP

R0 is set to 1, R1 is set to 300






#### 2bit - BHT - Collision

LOOP2: @T0 → 3300 / 3 = 1100 iterations, @Ti → 3000 / 3 = 1000 iterations

LOOP: 1 - 2 = -1 ≠ 0 → ∞ loop

R0 is set to 1, R1 is set to 300

Let us consider that the branch addresses do collide

Shared 2-BHT entry initially T<sub>strong</sub>:

LOOP2 → single misprediction at the end of the loop

LOOP → 100% success rate

Shared 2-BHT entry initially NT<sub>strong</sub>:

LOOP2 → at most 3 initial mispredictions (2 at the beginning and one at the end), then a single misprediction at the end of the loop

LOOP → 100% success rate

#### 2bit - BHT - Not Collide

LOOP2: @T0 → 3300 / 3 = 1100 iterations, @Ti → 3000 / 3 = 1000 iterations

LOOP: 1 - 2 = -1 ≠ 0 → ∞ loop

R0 is set to 1, R1 is set to 300

Let us consider that the branch addresses do not collide

[Figure: separate 2-BHT entries for LOOP and LOOP2, shown for several initial states (e.g. both T<sub>strong</sub>)]







#### SUMMARY

#### Assumption: NO collision

#### **WORST CASES**

1-BHT (LOOP: NT, LOOP2: NT): 2\*∞ mispredictions for LOOP2, 1 misprediction for LOOP.

2-BHT (LOOP: NT<sub>strong</sub>, LOOP2: NT<sub>strong</sub>): 2+1 @T0 + 1\*∞ mispredictions for LOOP2, 2 initial mispredictions for LOOP.

#### **BEST CASES**

1-BHT (LOOP: T, LOOP2: T): 1 @T0 + 2\*∞ mispredictions for LOOP2, 0 for LOOP.

2-BHT (LOOP: T<sub>strong</sub>, LOOP2: T<sub>strong</sub>): 1\*∞ mispredictions for LOOP2, 0 for LOOP.
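The 2-BHT figures can be checked with a sketch based on 2-bit saturating counters. Everything here is an assumption made for illustration: the helper `simulate_2bit`, the encoding (0 = NT strong, 1 = NT weak, 2 = T weak, 3 = T strong) and the plain saturating-counter update, which is only one of the possible 2-bit state machines. The `shared` flag models the collision case.

```python
def simulate_2bit(init_loop, init_loop2, shared=False, trips=(1100, 1000, 1000)):
    """2-bit saturating counters: 0 and 1 predict not taken, 2 and 3 predict taken.

    With shared=True both branches use entry 0 (initialised to init_loop2)
    and init_loop is ignored. Returns, per outer iteration, the pair
    (LOOP2 mispredictions, LOOP mispredictions).
    """
    entry = {"LOOP2": 0, "LOOP": 0 if shared else 1}
    table = [init_loop2, init_loop]

    def branch(name, taken):
        i = entry[name]
        miss = (table[i] >= 2) != taken
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
        return int(miss)

    result = []
    for n in trips:
        miss2 = sum(branch("LOOP2", i < n - 1) for i in range(n))
        miss1 = branch("LOOP", True)    # outer branch: always taken
        result.append((miss2, miss1))
    return result


print(simulate_2bit(0, 0))               # worst case: [(3, 1), (1, 1), (1, 0)]
print(simulate_2bit(3, 3))               # best case:  [(1, 0), (1, 0), (1, 0)]
print(simulate_2bit(0, 0, shared=True))  # collision:  [(3, 0), (1, 0), (1, 0)]
```

Compared with the 1-bit predictor, the single exit from LOOP2 only moves the counter from strong to weak taken, so the misprediction at the beginning of the next loop disappears.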












## Recall: The ILP Architecture Journey

Steps towards exploiting more ILP





Sequential (non pipelined): ideal CPI > 1

















#### Problem:

data dependences that cannot be hidden with bypassing or forwarding cause hardware stalls of the pipeline

Solution: allow instructions behind a stall to proceed

HW rearranges the instruction execution to reduce stalls

Enables out-of-order execution and completion (commit)

Out-of-order execution introduces the possibility of WAR and WAW data hazards.

First implemented in the CDC 6600 (1963)





## Exe 1 Scoreboard



Parallel operation in the Control Data 6600





## Recall: the Scoreboard pipeline

| ISSUE                                  | READ OPERAND   | EXE COMPLETE                     | WB                                                                                                       |
|----------------------------------------|----------------|----------------------------------|----------------------------------------------------------------------------------------------------------|
| Decode<br>instruction;                 | Read operands; | Operate on operands;             | Finish exec;                                                                                             |
| Structural FUs<br>check;<br>WAW checks | RAW check;     | Notify Scoreboard on completion; | WAR &Struct check<br>(FUs will hold results);<br>Can overlap<br>issue/read&write 4<br>Structural Hazard; |





### Exe 1 Scoreboard: the Code

```
I1: LD F6 32+ R2
I2: ADDD F2 F6 F4
I3: MULTD F0 F4 F2
I4: SUBD F12 F2 F6
I5: ADDD F0 F12 F2
```






I1: LD F6 32+ R2

I2: ADDD F2 F6 F4

I3: MULTD F0 F4 F2

I4: SUBD F12 F2 F6

I5: ADDD F0 F12 F2

RAW F6 I1-I2

RAW F6 I1-I4

RAW F2 I2-I3

RAW F2 I2-I4

RAW F2 I2-I5

RAW F12 I4-I5

WAW F0 I3-I5
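The hazard list above can be cross-checked mechanically. The sketch below is an illustration only: the `program` encoding (name, destination, sources) and the helper `find_hazards` are hypothetical, and the check is a naive all-pairs comparison that ignores intermediate overwrites, which is sufficient for this five-instruction sequence.

```python
# (name, destination register, source registers) for I1..I5
program = [
    ("I1", "F6",  ["R2"]),
    ("I2", "F2",  ["F6", "F4"]),
    ("I3", "F0",  ["F4", "F2"]),
    ("I4", "F12", ["F2", "F6"]),
    ("I5", "F0",  ["F12", "F2"]),
]


def find_hazards(program):
    """Compare every instruction with all earlier ones and report
    (kind, register, earlier instruction, later instruction)."""
    hazards = []
    for j, (name_j, dst_j, srcs_j) in enumerate(program):
        for name_i, dst_i, srcs_i in program[:j]:
            if dst_i in srcs_j:        # later reads what earlier writes
                hazards.append(("RAW", dst_i, name_i, name_j))
            if dst_j in srcs_i:        # later writes what earlier reads
                hazards.append(("WAR", dst_j, name_i, name_j))
            if dst_i == dst_j:         # both write the same register
                hazards.append(("WAW", dst_i, name_i, name_j))
    return hazards


for kind, reg, first, second in find_hazards(program):
    print(f"{kind} {reg} {first}-{second}")
```

For this code the sketch reports six RAW hazards and one WAW hazard, and no WAR hazard.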





## Exe 1.2 Scoreboard: ∃ a configuration?

|                    | Issue | Read Op | Exec Complete | Write Result |
|--------------------|-------|---------|---------------|--------------|
| I1: LD F6 32+ R2   | 1     | 2       | 7             | 8            |
| I2: ADDD F2 F6 F4  | 2     | 9       | 11            | 12           |
| I3: MULTD F0 F4 F2 | 4     | 13      | 43            | 44           |
| I4: SUBD F12 F2 F6 | 3     | 9       | 11            | 12           |
| I5: ADDD F0 F12 F2 | 13    | 17      | 19            | 20           |

- Is there a "configuration" that can respect the shown execution?
- How many units? Which kind? What latency?










|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   |       |                 |                 |    |         |      |
| I2 | ADDD F2 F6 F4  |       |                 |                 |    |         |      |
| I3 | MULTD F0 F4 F2 |       |                 |                 |    |         |      |
| I4 | SUBD F12 F2 F6 |       |                 |                 |    |         |      |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    |         |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.





|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   |       |                 |                 |    |         |      |
| I2 | ADDD F2 F6 F4  |       |                 |                 |    |         |      |
| I3 | MULTD F0 F4 F2 |       |                 |                 |    |         |      |
| I4 | SUBD F12 F2 F6 |       |                 |                 |    |         |      |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    |         |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool

1 MEM 2 cc latency

**RAW F6 I1-I2**

**RAW F6 I1-I4**





|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     |                 |                 |    |         | MU   |
| I2 | ADDD F2 F6 F4  |       |                 |                 |    |         |      |
| I3 | MULTD F0 F4 F2 |       |                 |                 |    |         |      |
| I4 | SUBD F12 F2 F6 |       |                 |                 |    |         |      |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    |         |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool

1 MEM 2 cc latency

**RAW F6 I1-I2**

**RAW F6 I1-I4**





|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               |                 |    |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     |                 |                 |    |         | FPU1 |
| I3 | MULTD F0 F4 F2 |       |                 |                 |    |         |      |
| I4 | SUBD F12 F2 F6 |       |                 |                 |    |         |      |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    |         |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool

1 MEM 2 cc latency

**RAW F6 I1-I2**

**RAW F6 I1-I4**





|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               |                 |    |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     |                 |                 |    | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     |                 |                 |    |         | FPU2 |
| I4 | SUBD F12 F2 F6 |       |                 |                 |    |         |      |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    |         |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, <u>single write</u> port for the pool

1 MEM 2 cc latency

RAW F6 I1-I2

RAW F2 I2-I3

RAW F2 I2-I4

RAW F2 I2-I5

RAW F12 I4-I5

**WAW F0 I3-I5**



|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               |    |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     |                 |                 |    | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     |                 |                 |    | RAW F2  | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     |                 |                 |    |         | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    |         |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

RAW F6 I1-I2

4 FPALU 3 cc latency, <u>single write</u> port for the pool

1 MEM 2 cc latency



RAW F2 I2-I3 RAW F2 I2-I4 RAW F2 I2-I5 RAW F12 I4-I5

**WAW F0 I3-I5**

|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     |                 |                 |    | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     |                 |                 |    | RAW F2  | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     |                 |                 |    |         | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    | WAW F0  |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, <u>single write</u> port for the pool

1 MEM 2 cc latency





**RAW F6 I1-I4** 

**RAW F2 I2-I3** 

RAW F2 I2-I4

**RAW F2 I2-I5** 

RAW F12 I4-I5

**WAW F0 I3-I5**



|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     |                 |                 |    | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     |                 |                 |    | RAW F2  | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     |                 |                 |    | RAW F2  | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    | WAW F0  |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

RAW F6 I1-I2

4 FPALU 3 cc latency, <u>single write</u> port for the pool

1 MEM 2 cc latency





RAW F2 I2-I3 RAW F2 I2-I4 RAW F2 I2-I5 RAW F12 I4-I5 WAW F0 I3-I5

RAW F6 I1-I4

|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               |                 |    | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     |                 |                 |    | RAW F2  | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     |                 |                 |    | RAW F2  | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    | WAW F0  |      |

4 FPALU 3 cc latency, <u>single write</u> port for the pool

1 MEM 2 cc latency

RAW F6 I1-I4

RAW F2 I2-I3

RAW F2 I2-I4

RAW F2 I2-I5

RAW F12 I4-I5





|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               | 9               |    | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     |                 |                 |    | RAW F2  | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     |                 |                 |    | RAW F2  | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    | WAW F0  |      |

4 FPALU 3 cc latency, single write port for the pool





|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               | 9               | 10 | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     |                 |                 |    | RAW F2  | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     |                 |                 |    | RAW F2  | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    | WAW F0  |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool

1 MEM 2 cc latency



**RAW F12 I4-I5 WAW F0 I3-I5**





|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               | 9               | 10 | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     | 11              |                 |    | RAW F2  | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     | 11              |                 |    | RAW F2  | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    | WAW F0  |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool

1 MEM 2 cc latency







RAW F12 I4-I5

**WAW F0 I3-I5** 

|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               | 9               | 10 | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     | 11              | 14              |    | RAW F2  | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     | 11              | 14              |    | RAW F2  | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    | WAW F0  |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool

1 MEM 2 cc latency







RAW F12 I4-I5 WAW F0 I3-I5

|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards | Unit |
|----|----------------|-------|-----------------|-----------------|----|---------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |         | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               | 9               | 10 | RAW F6  | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     | 11              | 14              |    | RAW F2  | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     | 11              | 14              |    | RAW F2  | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    | WAW F0  |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool

1 MEM 2 cc latency



RAW F2 I2-I5

RAW F12 I4-I5

**WAW F0 I3-I5** 







|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards               | Unit |
|----|----------------|-------|-----------------|-----------------|----|-----------------------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |                       | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               | 9               | 10 | RAW F6                | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     | 11              | 14              | 15 | RAW F2                | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     | 11              | 14              |    | RAW F2 +<br>Struct RF | FPU3 |
| I5 | ADDD F0 F12 F2 |       |                 |                 |    | WAW F0                |      |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool







|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards               | Unit |
|----|----------------|-------|-----------------|-----------------|----|-----------------------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |                       | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               | 9               | 10 | RAW F6                | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     | 11              | 14              | 15 | RAW F2                | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     | 11              | 14              | 16 | RAW F2 +<br>Struct RF | FPU3 |
| I5 | ADDD F0 F12 F2 | 16    |                 |                 |    | WAW F0                | FPU4 |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool







|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards               | Unit |
|----|----------------|-------|-----------------|-----------------|----|-----------------------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |                       | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               | 9               | 10 | RAW F6                | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     | 11              | 14              | 15 | RAW F2                | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     | 11              | 14              | 16 | RAW F2 +<br>Struct RF | FPU3 |
| I5 | ADDD F0 F12 F2 | 16    | 17              |                 |    | WAW F0                | FPU4 |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, single write port for the pool







|    | Instruction    | ISSUE | READ<br>OPERAND | EXE<br>COMPLETE | WB | Hazards               | Unit |
|----|----------------|-------|-----------------|-----------------|----|-----------------------|------|
| I1 | LD F6 32+ R2   | 1     | 2               | 4               | 5  |                       | MU   |
| I2 | ADDD F2 F6 F4  | 2     | 6               | 9               | 10 | RAW F6                | FPU1 |
| I3 | MULTD F0 F4 F2 | 3     | 11              | 14              | 15 | RAW F2                | FPU2 |
| I4 | SUBD F12 F2 F6 | 4     | 11              | 14              | 16 | RAW F2 +<br>Struct RF | FPU3 |
| I5 | ADDD F0 F12 F2 | 16    | 17              | 20              | 21 | WAW F0                | FPU4 |

If the previous table was not correct, please, write the right one and specify the number, kind and latency for each unit.

4 FPALU 3 cc latency, <u>single write</u> port for the pool
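The completed scoreboard table can be cross-checked with a small timing sketch. This is not the course's reference solution: the names `LATENCY`, `PROGRAM` and `scoreboard_timing` are hypothetical, and the rules are a simplification read off the slides (in-order issue of one instruction per cycle, WAW blocking issue, no forwarding, one write-back per cycle for the FP pool with the oldest instruction served first). WAR hazards and functional-unit conflicts are deliberately not modelled, since neither occurs in this code with 4 FP units and 1 memory unit.

```python
# Hypothetical sketch: names and rules are assumptions mirroring the slides.
LATENCY = {"MEM": 2, "FP": 3}  # cycles: "1 MEM 2 cc", "4 FPALU 3 cc"

# (name, unit kind, destination register, source registers)
PROGRAM = [
    ("I1", "MEM", "F6",  ["R2"]),
    ("I2", "FP",  "F2",  ["F6", "F4"]),
    ("I3", "FP",  "F0",  ["F4", "F2"]),
    ("I4", "FP",  "F12", ["F2", "F6"]),
    ("I5", "FP",  "F0",  ["F12", "F2"]),
]

def scoreboard_timing(program):
    """Return (name, issue, read operands, exe complete, WB) per instruction."""
    writer_wb = {}        # register -> WB cycle of its latest earlier writer
    fp_port_used = set()  # cycles in which the FP pool's write port is taken
    rows, prev_issue = [], 0
    for name, unit, dest, srcs in program:
        # In-order issue, one per cycle; a WAW hazard holds issue until the
        # earlier writer of the same register has written back.
        issue = max(prev_issue + 1, writer_wb.get(dest, 0) + 1)
        # No forwarding: operands are read the cycle after the producer's WB.
        read = max([issue + 1] +
                   [writer_wb[s] + 1 for s in srcs if s in writer_wb])
        exe_done = read + LATENCY[unit]
        wb = exe_done + 1
        if unit == "FP":
            # Single write port for the FP pool: slide to the next free cycle.
            # Processing in program order gives older instructions priority.
            while wb in fp_port_used:
                wb += 1
            fp_port_used.add(wb)
        writer_wb[dest] = wb
        rows.append((name, issue, read, exe_done, wb))
        prev_issue = issue
    return rows

for row in scoreboard_timing(PROGRAM):
    print(*row)
```

Under these assumptions the sketch yields the same ISSUE, READ OPERAND, EXE COMPLETE and WB cycles as the final table, including the one-cycle WB delay of I4 caused by the shared write port.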









### Exe Tomasulo



## Recall: the Tomasulo pipeline

| ISSUE                                                                                 | EXECUTION                                                         | WRITE                                                             |
|---------------------------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------|
| Get Instruction from<br>Queue and Rename<br>Registers                                 | Execute and Watch CDB;                                            | Write on CDB;                                                     |
| Structural RSs check;<br>WAW and WAR solved<br>by Renaming<br>(!!!in-order-issue!!!); | Check for Struct on FUs;<br>RAW delaying;<br>Struct check on CDB; | (FUs will hold results unless<br>CDB free)<br>RSs/FUs marked free |





### Exe .1 Tomasulo: Code

```
I1: lw $f1, 0($r0)
I2: faddi $f1, $f1, C1
I3: faddi $f2, $f1, C2
I4: sw $f2, 0($r0)
I5: lw $f2, 4($r0)
I6: fadd $f2, $f2, $f2
I7: sw $f2, 4($r0)
```





### Exe .1 Tomasulo: Conflicts

```
I1: lw    $f1, 0($r0)
I2: faddi $f1, $f1, C1
I3: faddi $f2, $f1, C2
I4: sw    $f2, 0($r0)
I5: lw    $f2, 4($r0)
I6: fadd  $f2, $f2, $f2
I7: sw    $f2, 4($r0)

RAW f1 I1-I2
RAW f1 I2-I3
RAW f2 I3-I4
RAW f2 I5-I6
RAW f2 I6-I7
WAW f2 I5-I6
WAW f2 I5-I3
WAW f1 I2-I1
WAW f2 I6-I3
WAR f2 I4-I5
WAR f2 I4-I6
```
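The conflict list for this code can be derived mechanically. The sketch below is one possible way to do it, not the course's tool: `PROGRAM` and `find_hazards` are hypothetical names, and the convention is inferred from the list on the slide (a RAW is reported only against the most recent earlier writer, while WAW and WAR are reported for every earlier writer or reader of the destination register).

```python
# Hypothetical sketch: the hazard convention is inferred from the slide's list.
# (name, destination register or None, source registers)
PROGRAM = [
    ("I1", "f1", ["r0"]),        # lw    $f1, 0($r0)
    ("I2", "f1", ["f1"]),        # faddi $f1, $f1, C1
    ("I3", "f2", ["f1"]),        # faddi $f2, $f1, C2
    ("I4", None, ["f2", "r0"]),  # sw    $f2, 0($r0)
    ("I5", "f2", ["r0"]),        # lw    $f2, 4($r0)
    ("I6", "f2", ["f2", "f2"]),  # fadd  $f2, $f2, $f2
    ("I7", None, ["f2", "r0"]),  # sw    $f2, 4($r0)
]

def find_hazards(program):
    """Return a list of (kind, register, earlier instr, later instr)."""
    hazards = []
    for j, (name_j, dest_j, srcs_j) in enumerate(program):
        # RAW: each distinct source depends on its most recent earlier writer.
        for reg in dict.fromkeys(srcs_j):
            for name_i, dest_i, _ in reversed(program[:j]):
                if dest_i == reg:
                    hazards.append(("RAW", reg, name_i, name_j))
                    break
        if dest_j is None:
            continue  # stores write no register
        for name_i, dest_i, srcs_i in program[:j]:
            if dest_i == dest_j:   # earlier write to the same register
                hazards.append(("WAW", dest_j, name_i, name_j))
            if dest_j in srcs_i:   # earlier read of the register written now
                hazards.append(("WAR", dest_j, name_i, name_j))
    return hazards

for kind, reg, older, younger in find_hazards(PROGRAM):
    print(kind, reg, f"{older}-{younger}")
```

Each pair is printed with the older instruction first, so a WAW written on the slide as `I5-I3` appears here as `I3-I5`.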





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type | RSi | Unit |
|----------------------------------|-------|--------------|----|--------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      |       |              |    |              |     |      |
| I2:faddi \$f1, \$f1, C1          |       |              |    |              |     |      |
| I3:faddi \$f2, \$f1, C2          |       |              |    |              |     |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      |       |              |    |              |     |      |
| I5:lw \$f2, 4(\$r0)              |       |              |    |              |     |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    |              |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |              |     |      |

RAW **f1** I1-I2, RAW **f1** I2-I3, RAW **f2** I3-I4, RAW **f2** I5-I6, RAW **f2** I6-I7





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type | RSi | Unit |
|----------------------------------|-------|--------------|----|--------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     |              |    |              | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          |       |              |    |              |     |      |
| I3:faddi \$f2, \$f1, C2          |       |              |    |              |     |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      |       |              |    |              |     |      |
| <b>I5</b> :1w \$f2, 4(\$r0)      |       |              |    |              |     |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    |              |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |              |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f**2 I3-I4 RAW **f**2 I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type | RSi | Unit |
|----------------------------------|-------|--------------|----|--------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            |    |              | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     |              |    | RAW \$f1     | RS3 |      |
| I3:faddi \$f2, \$f1, C2          |       |              |    |              |     |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      |       |              |    |              |     |      |
| I5:lw \$f2, 4(\$r0)              |       |              |    |              |     |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    |              |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |              |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4 RAW **f2** I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type           | RSi | Unit |
|----------------------------------|-------|--------------|----|------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            |    |                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     |              |    | RAW \$f1               | RS3 |      |
| I3:faddi \$f2, \$f1, C2          | 3     |              |    | RAW \$f1(struct ALU1)  | RS4 |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1) | RS2 |      |
| <b>I5</b>:lw \$f2, 4(\$r0)      |       |              |    |                        |     |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    |                        |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |                        |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f**2 I3-I4 RAW **f**2 I5-I6





#### Exe 3.2 Tomasulo: CC5?

- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type           | RSi | Unit |
|----------------------------------|-------|--------------|----|------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            |    |                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     |              |    | RAW \$f1               | RS3 |      |
| I3:faddi \$f2, \$f1, C2          | 3     |              |    | RAW \$f1(struct ALU1)  | RS4 |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1) | RS2 |      |
| I5:lw \$f2, 4(\$r0)              |       |              |    |                        |     |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    |                        |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |                        |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4 RAW **f2** I5-I6





#### Exe 3.2 Tomasulo: CC5?

- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type           | RSi | Unit |
|----------------------------------|-------|--------------|----|------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            |    |                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     |              |    | RAW \$f1               | RS3 |      |
| I3:faddi \$f2, \$f1, C2          | 3     |              |    | RAW \$f1(struct ALU1)  | RS4 |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1) | RS2 |      |
| <b>I5</b> :1w \$f2, 4(\$r0)      |       |              |    | struct RS1             |     |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    |                        |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |                        |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4 RAW **f2** I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type           | RSi | Unit |
|----------------------------------|-------|--------------|----|------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     |              |    | RAW \$f1               | RS3 |      |
| I3:faddi \$f2, \$f1, C2          | 3     |              |    | RAW \$f1(struct ALU1)  | RS4 |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1) | RS2 |      |
| I5:lw \$f2, 4(\$r0)              |       |              |    | struct RS1             |     |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    |                        |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |                        |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4 RAW **f2** I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type           | RSi | Unit |
|----------------------------------|-------|--------------|----|------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            |    | RAW \$f1               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     |              |    | RAW \$f1(struct ALU1)  | RS4 |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1) | RS2 |      |
| I5:lw \$f2, 4(\$r0)              | 7     |              |    | struct RS1             | RS1 |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    |                        |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |                        |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f**2 I3-I4 RAW **f**2 I5-I6





#### Exe 3.2 Tomasulo: CC8?

- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                | RSi | Unit |
|----------------------------------|-------|--------------|----|-----------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                             | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            |    | RAW \$f1                    | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     |              |    | RAW \$f1(struct ALU1)       | RS4 |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1)      | RS2 |      |
| I5:lw \$f2, 4(\$r0)              | 7     |              |    | struct RS1 + struct<br>LDU1 | RS1 |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    | struct RS3                  |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |                             |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f**2 I3-I4 RAW **f**2 I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                | RSi | Unit |
|----------------------------------|-------|--------------|----|-----------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                             | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                    | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     |              |    | RAW \$f1(struct ALU1)       | RS4 |      |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1)      | RS2 |      |
| <b>I5</b> :1w \$f2, 4(\$r0)      | 7     |              |    | struct RS1 + struct<br>LDU1 | RS1 |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 |       |              |    | struct RS3                  |     |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |                             |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f**2 I3-I4 RAW **f**2 I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                | RSi | Unit |
|----------------------------------|-------|--------------|----|-----------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                             | RS1 | LDU1 |
| <b>I2</b> :faddi \$f1, \$f1, C1  | 2     | 7            | 9  | RAW \$f1                    | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           |    | RAW \$f1(struct ALU1)       | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1)      | RS2 |      |
| <b>I5</b> :1w \$f2, 4(\$r0)      | 7     |              |    | struct RS1 + struct<br>LDU1 | RS1 |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    |              |    | struct RS3                  | RS3 |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    |                             |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f**2 I3-I4 RAW **f**2 I5-I6





#### Exe 3.2 Tomasulo: CC11?

- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                | RSi | Unit |
|----------------------------------|-------|--------------|----|-----------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                             | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                    | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           |    | RAW \$f1(struct ALU1)       | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1)      | RS2 |      |
| <b>I5</b> :lw \$f2, 4(\$r0)      | 7     |              |    | struct RS1 + struct<br>LDU1 | RS1 |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    |              |    | struct RS3                  | RS3 |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    | struct RS2                  |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4 RAW **f2** I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1(struct ALU1)                  | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     |              |    | RAW \$f2 (struct LDU1)                 | RS2 |      |
| <b>I5</b> :lw \$f2, 4(\$r0)      | 7     |              |    | struct RS1 + struct<br>LDU1            | RS1 |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    |              |    | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    | struct RS2                             |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f**2 I3-I4 RAW **f**2 I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| <b>I2</b> :faddi \$f1, \$f1, C1  | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1(struct ALU1)                  | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     | 13           |    | RAW \$f2 (struct LDU1)                 | RS2 | LDU1 |
| <b>I5</b> :1w \$f2, 4(\$r0)      | 7     |              |    | struct RS1 + struct<br>LDU1            | RS1 |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    |              |    | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    | struct RS2                             |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f**2 I3-I4 RAW **f**2 I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1(struct ALU1)                  | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     | 13           | 17 | RAW \$f2 (struct LDU1)                 | RS2 | LDU1 |
| <b>I5</b> :1w \$f2, 4(\$r0)      | 7     |              |    | struct RS1 + struct<br>LDU1            | RS1 |      |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    |              |    | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      |       |              |    | struct RS2                             |     |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f**2 I3-I4 RAW **f**2 I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FUs (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1(struct ALU1)                  | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     | 13           | 17 | RAW \$f2 (struct LDU1)                 | RS2 | LDU1 |
| I5:lw \$f2, 4(\$r0)              | 7     | 18           |    | struct RS1 + struct<br>LDU1            | RS1 | LDU1 |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    |              |    | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      | 18    |              |    | struct RS2                             | RS2 |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4

RAW **f2** I5-I6





# Exe 3.2 Tomasulo: CC19?

- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FU (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1 (struct ALU1)                 | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     | 13           | 17 | RAW \$f2 (struct LDU1)                 | RS2 | LDU1 |
| <b>I5</b> :lw \$f2, 4(\$r0)      | 7     | 18           |    | struct RS1 + struct<br>LDU1            | RS1 | LDU1 |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    |              |    | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      | 18    |              |    | struct RS2 + RAW \$f2<br>(struct LDU1) | RS2 |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4

RAW **f2** I5-I6
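Reading the slide title as a request for the machine state at clock cycle 19 (an interpretation, since the slide does not spell it out), a minimal sketch can mask the completed timings of this exercise so that only events up to a given cycle remain. The names `FINAL`, `snapshot` and `state` are hypothetical helpers introduced for illustration.

```python
# Final timings taken from the completed table of this exercise:
# instruction -> (ISSUE, START EXE, WB)
FINAL = {
    "I1": (1, 2, 6),
    "I2": (2, 7, 9),
    "I3": (3, 10, 12),
    "I4": (4, 13, 17),
    "I5": (7, 18, 22),
    "I6": (10, 23, 25),
    "I7": (18, 26, 30),
}


def snapshot(final, cycle):
    """Keep only the events that have happened by `cycle`; later ones become None."""
    return {
        name: tuple(t if t <= cycle else None for t in times)
        for name, times in final.items()
    }


def state(times):
    """Classify one instruction from its (possibly masked) timings."""
    issue, start, wb = times
    if issue is None:
        return "not issued"
    if start is None:
        return "waiting in RS"
    if wb is None:
        return "executing"
    return "completed"


# Print the masked table at clock cycle 19.
for name, times in snapshot(FINAL, 19).items():
    print(name, times, state(times))
```

With `cycle = 19` the masked rows match the partially filled table above: I1 to I4 have written back, I5 is executing, I6 and I7 are waiting in their reservation stations.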





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FU (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1 (struct ALU1)                 | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     | 13           | 17 | RAW \$f2 (struct LDU1)                 | RS2 | LDU1 |
| <b>I5</b> :lw \$f2, 4(\$r0)      | 7     | 18           | 22 | struct RS1 + struct<br>LDU1            | RS1 | LDU1 |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    |              |    | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 |      |
| <b>I7</b> :sw \$f2, 4(\$r0)      | 18    |              |    | struct RS2 + RAW \$f2<br>(struct LDU1) | RS2 |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4

RAW **f2** I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FU (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1 (struct ALU1)                 | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     | 13           | 17 | RAW \$f2 (struct LDU1)                 | RS2 | LDU1 |
| <b>I5</b> :lw \$f2, 4(\$r0)      | 7     | 18           | 22 | struct RS1 + struct<br>LDU1            | RS1 | LDU1 |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    | 23           |    | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 | ALU1 |
| <b>I7</b> :sw \$f2, 4(\$r0)      | 18    |              |    | struct RS2 + RAW \$f2<br>(struct LDU1) | RS2 |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4

RAW **f2** I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FU (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1 (struct ALU1)                 | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     | 13           | 17 | RAW \$f2 (struct LDU1)                 | RS2 | LDU1 |
| <b>I5</b> :lw \$f2, 4(\$r0)      | 7     | 18           | 22 | struct RS1 + struct<br>LDU1            | RS1 | LDU1 |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    | 23           | 25 | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 | ALU1 |
| <b>I7</b> :sw \$f2, 4(\$r0)      | 18    |              |    | struct RS2 + RAW \$f2<br>(struct LDU1) | RS2 |      |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4

RAW **f2** I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FU (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1 (struct ALU1)                 | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     | 13           | 17 | RAW \$f2 (struct LDU1)                 | RS2 | LDU1 |
| <b>I5</b> :lw \$f2, 4(\$r0)      | 7     | 18           | 22 | struct RS1 + struct<br>LDU1            | RS1 | LDU1 |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    | 23           | 25 | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 | ALU1 |
| <b>I7</b> :sw \$f2, 4(\$r0)      | 18    | 26           |    | struct RS2 + RAW \$f2<br>(struct LDU1) | RS2 | LDU1 |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4

RAW **f2** I5-I6





- 2 RESERVATION STATIONS (RS1, RS2) + 1 LOAD/STORE unit (LDU1) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 1 ALU/BR FU (ALU1) with latency 2

| Instruction                      | ISSUE | START<br>EXE | WB | Hazards Type                           | RSi | Unit |
|----------------------------------|-------|--------------|----|----------------------------------------|-----|------|
| <b>I1</b> :lw \$f1, 0(\$r0)      | 1     | 2            | 6  |                                        | RS1 | LDU1 |
| I2:faddi \$f1, \$f1, C1          | 2     | 7            | 9  | RAW \$f1                               | RS3 | ALU1 |
| I3:faddi \$f2, \$f1, C2          | 3     | 10           | 12 | RAW \$f1 (struct ALU1)                 | RS4 | ALU1 |
| <b>I4</b> :sw \$f2, 0(\$r0)      | 4     | 13           | 17 | RAW \$f2 (struct LDU1)                 | RS2 | LDU1 |
| <b>I5</b> :lw \$f2, 4(\$r0)      | 7     | 18           | 22 | struct RS1 + struct<br>LDU1            | RS1 | LDU1 |
| <b>I6</b> :fadd \$f2, \$f2, \$f2 | 10    | 23           | 25 | struct RS3 + RAW \$f2<br>(struct ALU1) | RS3 | ALU1 |
| <b>I7</b> :sw \$f2, 4(\$r0)      | 18    | 26           | 30 | struct RS2 + RAW \$f2<br>(struct LDU1) | RS2 | LDU1 |

RAW **f1** I1-I2

**RAW f1 I2-I3** 

RAW **f2** I3-I4

RAW **f2** I5-I6
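The completed table can be cross-checked with a small scheduling sketch. It is not a general Tomasulo simulator: it walks the instructions in program order and relies on timing rules inferred from the table rather than stated on the slide, namely that a unit is busy from START EXE until WB, that a reservation station is released at WB and reusable in the following cycle, that WB equals START EXE plus the unit latency, and that there is no contention on the result bus. Memory dependences between stores and loads are ignored. `Instr`, `PROGRAM` and `schedule` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    unit: str                 # functional unit: "LDU1" or "ALU1"
    dst: Optional[str]        # destination register, None for stores
    srcs: Tuple[str, ...]     # source registers


LATENCY = {"LDU1": 4, "ALU1": 2}
STATIONS = {"LDU1": ["RS1", "RS2"], "ALU1": ["RS3", "RS4"]}

PROGRAM = [
    Instr("I1", "LDU1", "f1", ("r0",)),        # lw    f1, 0(r0)
    Instr("I2", "ALU1", "f1", ("f1",)),        # faddi f1, f1, C1
    Instr("I3", "ALU1", "f2", ("f1",)),        # faddi f2, f1, C2
    Instr("I4", "LDU1", None, ("f2", "r0")),   # sw    f2, 0(r0)
    Instr("I5", "LDU1", "f2", ("r0",)),        # lw    f2, 4(r0)
    Instr("I6", "ALU1", "f2", ("f2", "f2")),   # fadd  f2, f2, f2
    Instr("I7", "LDU1", None, ("f2", "r0")),   # sw    f2, 4(r0)
]


def schedule(program):
    """Return (name, issue, start, wb, rs, unit) rows under the assumed rules."""
    rs_free = {rs: 1 for group in STATIONS.values() for rs in group}
    unit_free = {unit: 1 for unit in LATENCY}   # first cycle each unit is idle
    ready = {}                                  # register -> first usable cycle
    prev_issue = 0
    rows = []
    for ins in program:
        # In-order issue into the station of the right group that frees first.
        rs = min(STATIONS[ins.unit], key=lambda r: rs_free[r])
        issue = max(prev_issue + 1, rs_free[rs])
        # Execution waits for issue, for the unit, and for every RAW producer.
        # Registers with no pending producer (e.g. r0) are ready from cycle 1.
        start = max([issue + 1, unit_free[ins.unit]]
                    + [ready.get(src, 1) for src in ins.srcs])
        wb = start + LATENCY[ins.unit]
        rs_free[rs] = wb + 1
        unit_free[ins.unit] = wb + 1
        if ins.dst is not None:
            ready[ins.dst] = wb + 1
        prev_issue = issue
        rows.append((ins.name, issue, start, wb, rs, ins.unit))
    return rows


for row in schedule(PROGRAM):
    print(row)
```

Under these assumptions the sketch reproduces the ISSUE, START EXE, WB and RSi columns of the table; changing `LATENCY` or `STATIONS` shows how the structural hazards shift.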







# Thank you for your attention. Questions?

Alessandro Verosimile <alessandro.verosimile@polimi.it>

#### Acknowledgements

Davide Conficconi, E. Del Sozzo, Marco D. Santambrogio, D. Sciuto

Part of this material comes from:

- "Computer Organization and Design" and "Computer Architecture: A Quantitative Approach", the Patterson and Hennessy books
- News and papers cited throughout the lecture

and is the *property of its respective owners*



